Skip to content

Harden OpenAIProvider readiness probe#109

Merged
chris-colinsky merged 2 commits into
mainfrom
fix/harden-openai-readiness-probe
Jun 1, 2026
Merged

Harden OpenAIProvider readiness probe#109
chris-colinsky merged 2 commits into
mainfrom
fix/harden-openai-readiness-probe

Conversation

@chris-colinsky

Copy link
Copy Markdown
Member

Summary

Add a readiness_probe constructor kwarg on OpenAIProvider accepting "models", "chat_completions", or "both", and flip the default from the older catalog-only GET /v1/models probe to a new POST /v1/chat/completions probe with max_tokens=1.

Motivation: OpenAI-compatible proxies can return 200 on the catalog endpoint while rejecting completions (Bifrost is the field-reported case), so the previous probe reported ready while every real call failed. The new default actually exercises the inference wire path, so that failure class surfaces at preflight rather than at first real call. Non-200 chat-probe responses route through classify_http_error so canonical error categories (ProviderAuthentication, ProviderUnavailable, ProviderInvalidModel, etc.) surface consistently regardless of which probe ran. Catalog-only behavior remains opt-in for cost-sensitive cloud callers via readiness_probe="models".

The conformance harness now picks readiness_probe mode from the fixture's mocked health_endpoint.path so fixture 007's catalog semantics keep working without spec changes. Per the §5 fixture's own comment, the probe shape is implementation-defined.

This is a breaking change at the default level (pre-1.0 per the project's phased rollout policy). Pre-flight behavior changes for callers who construct OpenAIProvider without naming readiness_probe.

Test plan

  • 10 new unit tests covering all three modes, the three motivating failure paths (Bifrost 405, network error, 503+model-not-loaded), and the both-mode catalog short-circuit
  • Conformance fixture 007 still green via the harness's path-based probe-mode dispatch
  • Full suite: 1075 pass, 210 skip, no regressions
  • CHANGELOG [Unreleased] entry under ### Changed (breaking)
  • docs/concepts/llms.md new "Pre-flight readiness check" section
  • docs/agent/non-obvious-shapes.md entry for the agent-facing surface
  • src/openarmature/AGENTS.md regenerated

Add a ``readiness_probe`` constructor kwarg accepting "models",
"chat_completions", or "both", and flip the default from the older
catalog-only ``GET /v1/models`` probe to a new ``POST /v1/chat/
completions`` probe with ``max_tokens=1``.

Motivation: OpenAI-compatible proxies can return 200 on the catalog
endpoint while rejecting completions (Bifrost is the field-reported
case), so the previous probe reported ready while every real call
failed. The new default actually exercises the inference wire path,
so that failure class surfaces at preflight. Non-200 chat-probe
responses route through ``classify_http_error`` so canonical error
categories surface consistently. Catalog-only behavior remains opt-in
for cost-sensitive cloud callers.

Conformance harness picks ``readiness_probe`` mode from the fixture's
mocked ``health_endpoint.path`` so fixture 007's catalog semantics
keep working without spec changes.
Copilot AI review requested due to automatic review settings June 1, 2026 15:40

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens OpenAIProvider.ready() by introducing a configurable readiness-probe mode and switching the default probe from the catalog endpoint (GET /v1/models) to an inference-path probe (POST /v1/chat/completions with max_tokens=1) to catch “catalog-green but inference-broken” proxies earlier.

Changes:

  • Add readiness_probe kwarg to OpenAIProvider with modes "chat_completions" (default), "models", and "both".
  • Update conformance harness to select probe mode based on fixture health_endpoint.path (keeping fixture 007 behavior stable).
  • Add unit tests plus documentation and changelog entries describing the default flip and opt-out path.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/unit/test_llm_provider.py Adds unit coverage for the three readiness probe modes and key failure cases.
tests/conformance/test_llm_provider.py Dispatches readiness probe mode based on fixture health_endpoint.path.
src/openarmature/llm/providers/openai.py Implements readiness_probe and adds a chat-completions readiness probe path.
src/openarmature/AGENTS.md Documents the new default readiness behavior and opt-out modes.
docs/concepts/llms.md Adds a “Pre-flight readiness check” section documenting probe modes and tradeoffs.
docs/agent/non-obvious-shapes.md Documents readiness probe behavior for agent-facing surfaces.
CHANGELOG.md Records the default readiness-probe flip as a breaking change under Unreleased.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/openarmature/llm/providers/openai.py
Comment thread src/openarmature/llm/providers/openai.py
Comment thread src/openarmature/llm/providers/openai.py
Three findings on PR #109:

1. Runtime guard for ``readiness_probe``. The Literal type is a static
   hint; an unknown string would silently no-op both dispatch branches
   in ``ready()`` and report ready. Validate in ``__init__`` against a
   module-level frozenset and raise ValueError.
2. Route ``_probe_models`` non-200 responses through
   ``classify_http_error``. Previously hard-coded 401/403 to
   ProviderAuthentication and everything-else to ProviderUnavailable,
   missing ProviderRateLimit (429), ProviderModelNotLoaded (503+marker),
   and ProviderInvalidModel (404+marker). The docstring's
   mode-independence claim is now true.
3. Validate ``_probe_chat_completions`` 200 response shape. A proxy
   answering 200 with an error payload or non-OpenAI-shape JSON
   previously passed the probe. Mirror ``_do_complete``'s parse +
   ``_parse_response(payload, None, None)`` step.

Adds five new tests covering: invalid mode at construction, catalog
probe 429 → ProviderRateLimit, catalog probe 503+marker →
ProviderModelNotLoaded, chat probe 200 with error payload, and chat
probe 200 with non-JSON body.
@chris-colinsky chris-colinsky merged commit 0661af0 into main Jun 1, 2026
6 checks passed
@chris-colinsky chris-colinsky deleted the fix/harden-openai-readiness-probe branch June 1, 2026 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants